Building a comprehensive syntactic and semantic corpus of Chinese clinical texts


Abstract

Objective: To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain.

Materials and methods: An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus.

Results: The syntactic corpus consists of 138 Chinese clinical documents with 47,424 tokens and 2553 full parsing trees, while the semantic corpus includes 992 documents that annotated 39,511 entities with their assertions and 7695 relations. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective.

Discussion: The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage and active learning methods should be utilized to promote annotation efficiency.

Conclusions: In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus with its NLP modules were constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain.


Bibliographic details

Authors: He, Bin; Dong, Bin; Guan, Yi; Yang, Jinfeng; Jiang, Zhipeng; Yu, Qiubin; Cheng, Jianyi; Qu, Chunyan
Year: 2016
Original format: PDF

Similar literature


1. A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annotated Text corpus (MERLOT) [J]. Campillos Leonardo, Deleger Louise, Grouin Cyril. Language Resources and Evaluation. 2018, No. 2
2. Building a semantically annotated corpus of clinical texts [J]. Roberts A, Gaizauskas R, Hepple M. Journal of Biomedical Informatics. 2009, No. 5
3. Syntactic parsing of clinical text: Guideline and corpus development with handling ill-formed sentences [J]. Fan J.-W., Yang E.W., Jiang M. Journal of the American Medical Informatics Association. 2013, No. 6
4. A Joint Approach for Building a Large Tibetan Corpus with Syntactic Parsing and Semantic Role Labeling [C]. Qiu Lirong, Long Congjun, Zhao Xiaobing. 2012 Fifth International Conference on Intelligent Networks and Intelligent Systems. 2012
5. Developing a Cybersecurity Text Corpus and its Application for Augmenting Semantic Text Similarity [D]. Chavan, Manish Padmakar. 2014
6. Research and applications: Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences [O]. Jung-wei Fan, Elly W Yang, Min Jiang. 2013
7. Building a comprehensive syntactic and semantic corpus of Chinese clinical texts [O]. He, Bin; Dong, Bin; Guan, Yi. 2016
